Multivariate Time Series Prediction via Temporal Classification
Abstract
One of the important problems in many process industries is how to predict the occurrence of abnormal situations ahead of time in a multivariate time series environment. For example, in an oil refinery, hundreds of sensors (process variables) are installed at different sections of a process unit. These sensors constantly monitor every stage of the process. Typically, each process variable (represented by a sensor) is assigned value ranges, e.g., low, normal, and high. When the value of a variable reaches an undesirable range, e.g., low or high, an alarm is generated and displayed on a computer screen. The human operator who monitors the alarms relies on experience to decide which alarms are important and what actions to take. This manual process is acceptable for non-critical variables: when alarms from these variables are generated, it is not too late to react. However, there are some critical variables that can significantly affect the quality of the product or indicate danger in the process. When alarms are generated from these variables, it is often too late to perform corrective actions. This can result in poor product quality, or even shutdown of the whole plant to prevent fire or explosion. It is thus of great importance to predict ahead of time when these critical variables will reach undesirable ranges. By doing so, the plant engineers can have sufficient time to take preventive actions. Traditionally, engineers and researchers have tried to build mathematical models to understand the physical process and to capture the temporal relationships among the variables. However, in a typical plant, there are hundreds of variables, and their interactions are extremely complicated. Building an accurate mathematical model is very difficult, if not impossible. So far, little success has been achieved. Although time series forecasting techniques exist in statistics, they are mostly designed for a single time series.
Those that work with multiple time series often do not produce satisfactory results. In the past few years, a number of data mining studies have been reported on pattern discovery and pattern matching in time series data. However, discovered patterns are not necessarily useful for prediction. Limited work has been done on how to build predictive models directly from multivariate time series. In this paper, we study a special form of time series prediction, i.e., one in which the predicted (dependent) variable takes discrete values. Although in a real application this variable may take numeric values, the users are usually only interested in its value ranges, e.g., normal or abnormal, not its actual values. This problem is related to traditional classification. In traditional classification, a classifier is built from the training data and is then used to predict the class of a new data object. Many successful techniques have been developed. However, these techniques cannot be directly applied to time series data because they are unable to consider time in their model building processes. In this work, we extended two traditional classification techniques, namely, the naïve Bayesian classifier [1] and decision trees [3], to suit temporal prediction. This results in two new techniques, the temporal naïve Bayesian model (T-NB) and the temporal decision tree (T-DT). T-NB and T-DT have been tested on 7 real-life datasets from an oil refinery. Experimental results show that they produce very accurate predictions. Our contributions: To the best of our knowledge, this is the first time that the two classic classification techniques have been extended for time series prediction. Unlike traditional classification, the key issue in temporal prediction is that one variable's fluctuation often causes changes in other variables after certain periods of time, i.e., with some time lags or delays.
Thus, we augment the existing classification methods with a temporal component, which is able to find a suitable prediction time lag (or delay) for each variable at different stages of the model building. We have tested the proposed T-NB and T-DT methods and compared them with existing techniques using 7 real-life datasets. The results show that they predict far more accurately than existing statistical and other types of techniques. The results also show that the T-NB method produces more accurate predictors than the T-DT method. The T-NB method is also very efficient: it scans the data only twice and allows the data to reside on disk. Further details can be found in [2].
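The idea of pairing each variable with its own prediction time lag can be illustrated with a small sketch. This is not the authors' T-NB algorithm, whose details are not given in the abstract. It assumes a plain naïve Bayes classifier over discretized values, and it uses mutual information between a lagged variable and the target as a stand-in criterion for choosing the lag. All names (`discretize`, `best_lag`, `LaggedNaiveBayes`), the thresholds, and the synthetic data are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def discretize(value, low, high):
    """Map a numeric reading to a value range (thresholds are hypothetical)."""
    if value < low:
        return "low"
    if value > high:
        return "high"
    return "normal"


def lagged_pairs(feature, target, lag):
    """Pair feature[t - lag] with target[t] for every t where both exist."""
    return list(zip(feature[:len(feature) - lag], target[lag:]))


def mutual_information(pairs):
    """Empirical mutual information (in nats) between the two pair members."""
    n = len(pairs)
    joint = Counter(pairs)
    fx = Counter(x for x, _ in pairs)
    fy = Counter(y for _, y in pairs)
    return sum((c / n) * math.log(c * n / (fx[x] * fy[y]))
               for (x, y), c in joint.items())


def best_lag(feature, target, min_lag, max_lag):
    """Assumed criterion: keep the lag with the highest mutual information."""
    return max(range(min_lag, max_lag + 1),
               key=lambda lag: mutual_information(lagged_pairs(feature, target, lag)))


class LaggedNaiveBayes:
    """Naive Bayes in which each variable is read at its own time lag."""

    def __init__(self, lags):
        self.lags = lags  # variable name -> lag in time steps

    def fit(self, features, target):
        start = max(self.lags.values())
        self.class_counts = Counter(target[start:])
        self.cond = defaultdict(Counter)  # (name, class) -> value counts
        self.values = {name: set(series) for name, series in features.items()}
        for t in range(start, len(target)):
            for name, lag in self.lags.items():
                self.cond[(name, target[t])][features[name][t - lag]] += 1
        return self

    def predict(self, features, t):
        """Predict the target class at time t from features[name][t - lag]."""
        total = sum(self.class_counts.values())
        best_class, best_score = None, -math.inf
        for cls, n_cls in self.class_counts.items():
            score = math.log(n_cls / total)
            for name, lag in self.lags.items():
                value = features[name][t - lag]
                k = len(self.values[name])
                # Laplace smoothing avoids zero probabilities.
                score += math.log((self.cond[(name, cls)][value] + 1) / (n_cls + k))
            if score > best_score:
                best_class, best_score = cls, score
        return best_class


# Synthetic demo: the target turns abnormal 3 steps after "a" was high.
rng = random.Random(0)
readings = [rng.uniform(0.0, 100.0) for _ in range(500)]
a = [discretize(r, low=33.0, high=66.0) for r in readings]
target = ["normal"] * 3 + ["abnormal" if v == "high" else "normal" for v in a[:-3]]
features = {"a": a}

lags = {name: best_lag(series, target, 1, 10) for name, series in features.items()}
model = LaggedNaiveBayes(lags).fit(features, target)
horizon = min(lags.values())  # how far ahead the latest readings can reach
forecast = model.predict(features, len(target) - 1 + horizon)
print(lags, forecast)
```

Because each variable is read at `t - lag`, the model can forecast up to the smallest selected lag beyond the last observed time step, which is what gives operators lead time in this sketch.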
Similar resources
Evaluation of Univariate, Multivariate and Combined Time Series Model to Prediction and Estimation the Mean Annual Sediment (Case Study: Sistan River)
Erosion, sediment transport, and sediment estimation, together with the damage they cause in rivers, are among the most important issues in river engineering. Correct modeling and prediction of this parameter, taking the river flow discharge into account, can be very useful for the service life of hydraulic structures and drainage networks. In fact, using the multivariate models and involving the effective other parameter...
Review: Mining Recent Temporal Patterns for Event Detection in Multivariate Time Series Data
Iyad Batal et al., in the paper "Mining Recent Temporal Patterns for Event Detection in Multivariate Time Series Data", proposed a pattern mining approach for multivariate health time series data, which is then used for classification and prediction of diseases. To extract the patterns, they assigned a fuzzy value in time intervals instead of numerical values for each variable. Then, they concate...
Multivariate Feature Extraction for Prediction of Future Gene Expression Profile
Introduction: The features of a cell can be extracted from its gene expression profile. If the gene expression profiles of future descendant cells are predicted, the features of the future cells are also predicted. The objective of this study was to design an artificial neural network to predict gene expression profiles of descendant cells that will be generated by division/differentiation of h...
Leveraging Patient Similarity and Time Series Data in Healthcare Predictive Models
Patient time series classification faces challenges in high degrees of dimensionality and missingness. In light of patient similarity theory, this study explores effective temporal feature engineering and reduction, missing value imputation, and change point detection methods that can afford similarity-based classification models with desirable accuracy enhancement. We select a piecewise aggreg...
Publication date: 2002